Please note: that This is just a quick look of me into datasets. Maybe, I will do more later.
As I know from previous notebooks and dataset descriptions. Olist is just a ecommerce solution, they just provide solutions for many sellers and buyers to gain the surplus.
I am from VietNam, which has many active and competitive marketplace platforms. From my vision in VietNam ecommerce. In basic, The market is divided into 2 main types:
VietNam start-up companies that operate based on the funding by themselves ( You can search Tiki and Sendo to know more their strategies and special operations)
Foreign companies that operate based on the funding from their mother company ( You can search Shopee and Lazada to know more their strategies and special operations)
It is probably not relevant to Olist, but I just want to provide you some other ways that how ecommerce are manipulated strategy to find a new way to survive.
In last year, many ecommerce companies have given up and filed a bankrupcy because of hot competitive marketplace. ** Ecommerce is burning money **
Tiki is a very special ecommerce in VietNam that has a 2h last mile delivery for Tikinow (Tiki has hired some people from Amazon and Facebook). Customers segment of Tiki is mainly located in City. To be honest, I wish I could work in Tiki environment because of their very special strategy and vision.
Sendo is known as VietNam start-up (like as Tiki). They focus on customers located in countryside and then widen to city
In recently, There is some information that Tiki and Sendo are prepared to merge.
Lazada ans Shopee: Lazada is from giant Alibaba in China and Shopee is from giant SEA located in Singapore. They also burn many and many money to gain the marketshare.
Top 1 of the number of suffering is Shopee Top 2 of the number of suffering is Lazada Top 3 of the number of suffering is Tiki Top 4 of the number of suffering is Sendo
Unlikely VietNam companies, Olist operates very differently. They just provide solution and platform for their customers (sellers and buyers). I am surprised that the leadtime for shipping product is mainly 12 days. Moreover there is some cases that are 42 days.
- There is no stock, warehouse
- There is no shipping product
- Don't buy products
To know more Olist. I suggest that you should click on Another_notebook. The author mention very well how Olist oprate it model. On the other hand, I learn from this notebook very much.
Here are some techniques that I use to standardize datasets for convenient purposes. Likewise, packages are used up to help me to intepret datasets.
str_to_title() function to convert string into right formatgridExtra, but it is very convenient for manipulatingI don’t know why they just store in prefix-zip-code in datasets. Because This is not enough elements to determine each geolocation for each customers. To visualize geolocation, I just calculate the mean of longitude and latitude to have a general view.
- customer_id
- customer_unique_id
- customer_zip_code_prefix
- customer_city
- customer_state
| customer_id | customer_unique_id | customer_zip_code_prefix | customer_city | customer_state |
|---|---|---|---|---|
| 06b8999e2fba1a1fbc88172c00ba8bc7 | 861eff4711a542e4b93843c6dd7febb0 | 14409 | Franca | SP |
| 18955e83d337fd6b2def6b18a428ac77 | 290c77bc529b7ac935b93aa66c333dc3 | 9790 | Sao Bernardo Do Campo | SP |
| 4e7b3e00288586ebd08712fdd0374a03 | 060e732b5b29e8181a18229c7b0b2b5e | 1151 | Sao Paulo | SP |
| b2b6027bc5c5109e529d4dc6358b12c3 | 259dac757896d24d7702b9acbbff3f3c | 8775 | Mogi Das Cruzes | SP |
| 4f2d8ab171c80ec8364f7c12e35b23ad | 345ecd01c38d18a9036ed96c73b8d066 | 13056 | Campinas | SP |
| 879864dab9bc3047522c92c82e1212b8 | 4c93744516667ad3b8f1fb645a3116a4 | 89254 | Jaragua Do Sul | SC |
| fd826e7cf63160e536e0908c76c3f441 | addec96d2e059c80c30fe6871d30d177 | 4534 | Sao Paulo | SP |
| 5e274e7a0c3809e14aba7ad5aae0d407 | 57b2a98a409812fe9618067b6b8ebe4f | 35182 | Timoteo | MG |
| 5adf08e34b2e993982a47070956c5c65 | 1175e95fb47ddff9de6b2b06188f7e0d | 81560 | Curitiba | PR |
| 4b7139f34592b3a31687243a302fa75b | 9afe194fb833f79e300e37e580171f22 | 30575 | Belo Horizonte | MG |
There are some typo in processing text, but it’s more popular in less frequent geolocation_city
- geolocation_zip_code_prefix
- geolocation_lat
- geolocation_lng
- geolocation_city
- geolocation_state
| geolocation_zip_code_prefix | geolocation_lat | geolocation_lng | geolocation_city | geolocation_state |
|---|---|---|---|---|
| 1037 | -23.54562 | -46.63929 | Sao Paulo | SP |
| 1046 | -23.54608 | -46.64482 | Sao Paulo | SP |
| 1046 | -23.54613 | -46.64295 | Sao Paulo | SP |
| 1041 | -23.54439 | -46.63950 | Sao Paulo | SP |
| 1035 | -23.54158 | -46.64161 | Sao Paulo | SP |
| 1012 | -23.54776 | -46.63536 | Sã£O Paulo | SP |
| 1047 | -23.54627 | -46.64123 | Sao Paulo | SP |
| 1013 | -23.54692 | -46.63426 | Sao Paulo | SP |
| 1029 | -23.54377 | -46.63428 | Sao Paulo | SP |
| 1011 | -23.54764 | -46.63603 | Sao Paulo | SP |
There are some typo in processing text, but it’s more popular in less frequent geolocation_city
- order_id
- customer_id
- order_status
- order_purchase_timestamp
- order_approved_at
- order_delivered_carrier_date
- order_delivered_customer_date
- order_estimated_delivery_date
| order_id | customer_id | order_status | order_purchase_timestamp | order_approved_at | order_delivered_carrier_date | order_delivered_customer_date | order_estimated_delivery_date |
|---|---|---|---|---|---|---|---|
| e481f51cbdc54678b7cc49136f2d6af7 | 9ef432eb6251297304e76186b10a928d | delivered | NA | 2017-10-02 11:07:15 | 2017-10-04 19:55:00 | 2017-10-10 21:25:13 | 2017-10-18 00:00:00 |
| 53cdb2fc8bc7dce0b6741e2150273451 | b0830fb4747a6c6d20dea0b8c802d7ef | delivered | NA | 2018-07-26 03:24:27 | 2018-07-26 14:31:00 | 2018-08-07 15:27:45 | 2018-08-13 00:00:00 |
| 47770eb9100c2d0c44946d9cf07ec65d | 41ce2a54c0b03bf3443c3d931a367089 | delivered | NA | 2018-08-08 08:55:23 | 2018-08-08 13:50:00 | 2018-08-17 18:06:29 | 2018-09-04 00:00:00 |
| 949d5b44dbf5de918fe9c16f97b45f8a | f88197465ea7920adcdbec7375364d82 | delivered | NA | 2017-11-18 19:45:59 | 2017-11-22 13:39:59 | 2017-12-02 00:28:42 | 2017-12-15 00:00:00 |
| ad21c59c0840e6cb83a9ceb5573f8159 | 8ab97904e6daea8866dbdbc4fb7aad2c | delivered | NA | 2018-02-13 22:20:29 | 2018-02-14 19:46:34 | 2018-02-16 18:17:02 | 2018-02-26 00:00:00 |
| a4591c265e18cb1dcee52889e2d8acc3 | 503740e9ca751ccdda7ba28e9ab8f608 | delivered | NA | 2017-07-09 22:10:13 | 2017-07-11 14:58:04 | 2017-07-26 10:57:55 | 2017-08-01 00:00:00 |
| 136cce7faa42fdb2cefd53fdc79a6098 | ed0271e0b7da060a393796590e7b737a | invoiced | NA | 2017-04-13 13:25:17 | 2017-05-09 00:00:00 | ||
| 6514b8ad8028c9f2cc2374ded245783f | 9bdf08b4b3b52b5526ff42d37d47f222 | delivered | 1970-01-01 07:00:30 | 2017-05-16 13:22:11 | 2017-05-22 10:07:46 | 2017-05-26 12:55:51 | 2017-06-07 00:00:00 |
| 76c6e866289321a7c93b82b54852dc33 | f54a9f0e6b351c431402b8461ea51999 | delivered | NA | 2017-01-25 02:50:47 | 2017-01-26 14:16:31 | 2017-02-02 14:08:10 | 2017-03-06 00:00:00 |
| e69bfb5eb88e0ed6a785585b27e16dbf | 31ad1d1b63eb9962463f764d4e6e0c9d | delivered | NA | 2017-07-29 12:05:32 | 2017-08-10 19:45:24 | 2017-08-16 17:14:30 | 2017-08-23 00:00:00 |
I don’t understand Is order_item_id like as order quantity?
- order_id
- order_item_id
- product_id
- seller_id
- shipping_limit_date
- price
- freight_value
| order_id | order_item_id | product_id | seller_id | shipping_limit_date | price | freight_value |
|---|---|---|---|---|---|---|
| 00010242fe8c5a6d1ba2dd792cb16214 | 1 | 4244733e06e7ecb4970a6e2683c13e61 | 48436dade18ac8b2bce089ec2a041202 | NA | 58.90 | 13.29 |
| 00018f77f2f0320c557190d7a144bdd3 | 1 | e5f2d52b802189ee658865ca93d83a8f | dd7ddc04e1b6c2c614352b383efe2d36 | 1970-01-01 07:00:13 | 239.90 | 19.93 |
| 000229ec398224ef6ca0657da4fc703e | 1 | c777355d18b72b67abbeef9df44fd0fd | 5b51032eddd242adc84c38acab88f23d | NA | 199.00 | 17.87 |
| 00024acbcdf0a6daa1e931b038114c75 | 1 | 7634da152a4610f1595efa32f14722fc | 9d7a1d34a5052409006425275ba1c2b4 | 1970-01-01 07:00:18 | 12.99 | 12.79 |
| 00042b26cf59d7ce69dfabb4e55b4fd9 | 1 | ac6c3623068f30de03045865e4e10089 | df560393f3a51e74553ab94004ba5c87 | NA | 199.90 | 18.14 |
| 00048cc3ae777c65dbb7d2a0634bc1ea | 1 | ef92defde845ab8450f9d70c526ef70f | 6426d21aca402a131fc0a5d0960a3c90 | NA | 21.90 | 12.69 |
| 00054e8431b9d7675808bcb819fb4a32 | 1 | 8d4f2bb7e93e6710a28f34fa83ee7d28 | 7040e82f899a04d1b434b795a43b4617 | 1970-01-01 07:00:31 | 19.90 | 11.85 |
| 000576fe39319847cbb9d288c5617fa6 | 1 | 557d850972a7d6f792fd18ae1400d9b6 | 5996cddab893a4652a15592fb58ab8db | NA | 810.00 | 70.75 |
| 0005a1a1728c9d785b8e2b08b904576c | 1 | 310ae3c140ff94b03219ad0adc3c778f | a416b6a846a11724393025641d4edd5e | NA | 145.95 | 11.65 |
| 0005f50442cb953dcd1d21e1fb923495 | 1 | 4535b0e1091c278dfd193e5a1d63b39f | ba143b05f0110f0dc71ad71b4466ce92 | 1970-01-01 07:00:56 | 53.99 | 11.40 |
I don’t understand payment_sequential like as payment quantity?
- order_id
- payment_sequential
- payment_type
- payment_installments
- payment_value
| order_id | payment_sequential | payment_type | payment_installments | payment_value |
|---|---|---|---|---|
| b81ef226f3fe1789b1e8b2acac839d17 | 1 | Credit card | 8 | 99.33 |
| a9810da82917af2d9aefd1278f1dcfa0 | 1 | Credit card | 1 | 24.39 |
| 25e8ea4e93396b6fa0d3dd708e76c1bd | 1 | Credit card | 1 | 65.71 |
| ba78997921bbcdc1373bb41e913ab953 | 1 | Credit card | 8 | 107.78 |
| 42fdf880ba16b47b59251dd489d4441a | 1 | Credit card | 2 | 128.45 |
| 298fcdf1f73eb413e4d26d01b25bc1cd | 1 | Credit card | 2 | 96.12 |
| 771ee386b001f06208a7419e4fc1bbd7 | 1 | Credit card | 1 | 81.16 |
| 3d7239c394a212faae122962df514ac7 | 1 | Credit card | 3 | 51.84 |
| 1f78449c87a54faf9e96e88ba1491fa9 | 1 | Credit card | 6 | 341.09 |
| 0573b5e23cbd798006520e1d5b4c6714 | 1 | Boleto | 1 | 51.95 |
I don’t know Portugese, so the review comment is quite difficult for me to understand. But my first think approach is use sentiment
- review_id
- order_id
- review_score
- review_comment_title
- review_comment_message
- review_creation_date
- review_answer_timestamp
| score | comment | creation_date | answer |
|---|---|---|---|
| 4 | 1970-01-01 07:00:00 | 2018-01-18 21:46:59 | |
| 5 | 1970-01-01 07:00:00 | 2018-03-11 03:05:13 | |
| 5 | 1970-01-01 07:00:00 | 2018-02-18 14:36:24 | |
| 5 | Recebi bem antes do prazo estipulado. | 1970-01-01 07:00:00 | 2017-04-21 22:02:06 |
| 5 | Parabéns lojas lannister adorei comprar pela Internet seguro e prático Parabéns a todos feliz Páscoa | 1970-01-01 07:00:00 | 2018-03-02 10:26:53 |
| 1 | 1970-01-01 07:00:00 | 2018-04-16 00:39:37 | |
| 5 | 1970-01-01 07:00:00 | 2017-07-18 19:30:34 | |
| 5 | 1970-01-01 07:00:00 | 2018-08-14 21:36:06 | |
| 5 | 1970-01-01 07:00:00 | 2017-05-18 12:05:37 | |
| 4 | aparelho eficiente. no site a marca do aparelho esta impresso como 3desinfector e ao chegar esta com outro nome…atualizar com a marca correta uma vez que é o mesmo aparelho | 1970-01-01 07:00:00 | 2018-05-23 16:45:47 |
- seller_city
- seller_id
- seller_zip_code_prefix
- seller_state
| seller_city | seller_id | seller_zip_code_prefix | seller_state |
|---|---|---|---|
| Campinas | 3442f8959a84dea7ee197c632cb2df15 | 13023 | SP |
| Mogi Guacu | d1b65fc7debc3361ea86b5f14c68d2e2 | 13844 | SP |
| Rio De Janeiro | ce3ad9de960102d0677a81f5d0bb7b2d | 20031 | RJ |
| Sao Paulo | c0f3eea2e14555b6faeea3dd58c1b1c3 | 4195 | SP |
| Braganca Paulista | 51a04a8a6bdcb23deccc82b0b80742cf | 12914 | SP |
| Rio De Janeiro | c240c4061717ac1806ae6ee72be3533b | 20920 | RJ |
| Brejao | e49c26c3edfa46d227d5121a6b6e4d37 | 55325 | PE |
| Penapolis | 1b938a7ec6ac5061a66a3766e0e75f90 | 16304 | SP |
| Sao Paulo | 768a86e36ad6aae3d03ee3c6433d61df | 1529 | SP |
| Curitiba | ccc4bbb5f32a6ab2b7066a4130f114e3 | 80310 | PR |
- product_category_name
- weight
- quantity
- height
- widths
| product_category_name | weight | quantity | height | width |
|---|---|---|---|---|
| Perfumaria | 225 | 1 | 10 | 14 |
| Artes | 1000 | 1 | 18 | 20 |
| Esporte_lazer | 154 | 1 | 9 | 15 |
| Bebes | 371 | 1 | 4 | 26 |
| Utilidades_domesticas | 625 | 4 | 17 | 13 |
| Instrumentos_musicais | 200 | 1 | 5 | 11 |
| Cool_stuff | 18350 | 4 | 24 | 44 |
| Moveis_decoracao | 900 | 2 | 8 | 40 |
| Eletrodomesticos | 400 | 1 | 13 | 17 |
| Brinquedos | 600 | 1 | 10 | 12 |
- product_category_name
- product_category_name_english
| product_category_name | product_category_name_english |
|---|---|
| beleza_saude | Health_beauty |
| informatica_acessorios | Computers_accessories |
| automotivo | Auto |
| cama_mesa_banho | Bed_bath_table |
| moveis_decoracao | Furniture_decor |
| esporte_lazer | Sports_leisure |
| perfumaria | Perfumery |
| utilidades_domesticas | Housewares |
| telefonia | Telephony |
| relogios_presentes | Watches_gifts |
## [1] "Cities"
## [1] "Cities that have equal and more than 100 customers"
## [1] "This dataset have some duplicated rows, the long and lat of different cities are the same"
## [1] "States"
## [1] "There are 27 states in Brazil"
## [1] "This dataset have some duplicated rows, the long and lat of different cities are the same"
## [1] "Duplicated recoreds, Because of the first time in cleaning mistakes, I can't visualize it within city. Likewise, State scope is a potential choice"
##
## FALSE
## 99441
## [1] "Yes"
## [1] "Cities"
## [1] "Cities that have equal and more than 100 sellers"
## [1] "This dataset have some duplicated rows, the long and lat of different cities are the same"
## [1] "States"
## [1] "There are 27 states in Brazil"
## [1] "This dataset have some duplicated rows, the long and lat of different cities are the same"